Anscombe’s Data

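We can use R to explore the relationships between variables. Here we will work with Anscombe’s data: four small data sets, labelled \(a\) to \(d\), in which sets \(a\), \(b\), and \(c\) share the same \(x\) values. First we import the data.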

  anscombe <- read.csv("anscombe.csv")
  head(anscombe)

Base R has a function, pairs(), that plots every variable in a data frame against every other variable.

  pairs(anscombe)

We note that the second to fourth rows of the first column contain plots of \(x\) against the \(y\) values of sets \(a\), \(b\), and \(c\), and that the fifth row of the last column contains a plot of the \(x\) of set \(d\) against its \(y\). The relationships between \(x\) and \(y\) look quite different from one set to the next.
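The full grid also contains panels that mix data sets, such as the \(x\) of set \(d\) against the \(y\) of set \(a\). A minimal sketch that plots only the four matching pairs is shown here. It assumes the built-in `datasets::anscombe` data frame (columns `x1` to `x4`, `y1` to `y4`) as a stand-in for `anscombe.csv`, renamed to the column names used above; the row order in the CSV may differ. The names `dat` and `matched` are illustrative only.

```r
# Assumption: datasets::anscombe stands in for anscombe.csv.
# x1, x2 and x3 are identical in the built-in data, so x1 plays the role of xabc.
dat <- with(datasets::anscombe,
            data.frame(xabc = x1, ya = y1, yb = y2, yc = y3, yd = y4, xd = x4))

# Each entry names the x column and the y column of one data set.
matched <- list(a = c("xabc", "ya"), b = c("xabc", "yb"),
                c = c("xabc", "yc"), d = c("xd", "yd"))

op <- par(mfrow = c(2, 2))   # lay the panels out in a 2 x 2 grid
for (set in names(matched)) {
  cols <- matched[[set]]
  plot(dat[[cols[1]]], dat[[cols[2]]],
       xlab = cols[1], ylab = cols[2], main = paste("Set", set))
}
par(op)                      # restore the previous graphics settings
```

Keeping the column pairs in a named list makes it easy to reuse the same pairing for other summaries.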

A look at the covariances and correlations may be informative.

  cov(anscombe)

##        xabc         ya          yb       yc          yd     xd
## xabc 11.000  5.5010000  5.50000000  5.49700  0.02000000 -4.400
## ya    5.501  4.1272691  3.09560909  1.93343  0.26806909 -2.003
## yb    5.500  3.0956091  4.12762909  2.42524 -0.05958091 -3.037
## yc    5.497  1.9334300  2.42524000  4.12262  0.09328000 -1.947
## yd    0.020  0.2680691 -0.05958091  0.09328  4.12324909  5.499
## xd   -4.400 -2.0030000 -3.03700000 -1.94700  5.49900000 11.000

  cor(anscombe)

##              xabc          ya          yb          yc           yd         xd
## xabc  1.000000000  0.81642052  0.81623651  0.81628674  0.002969709 -0.4000000
## ya    0.816420516  1.00000000  0.75000540  0.46871668  0.064982372 -0.2972715
## yb    0.816236506  0.75000540  1.00000000  0.58791933 -0.014442321 -0.4507110
## yc    0.816286739  0.46871668  0.58791933  1.00000000  0.022624662 -0.2891232
## yd    0.002969709  0.06498237 -0.01444232  0.02262466  1.000000000  0.8165214
## xd   -0.400000000 -0.29727146 -0.45071096 -0.28912321  0.816521437  1.0000000
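Only four entries in each 6 by 6 matrix pair an \(x\) with its own \(y\). One way to pull them out, sketched here, is to index the matrix with a two-column character matrix of row and column names. As an assumption, the built-in `datasets::anscombe` data frame stands in for `anscombe.csv` and is renamed to match the columns above; `dat`, `idx`, `matched_cov`, and `matched_cor` are illustrative names.

```r
# Assumption: datasets::anscombe stands in for anscombe.csv (row order may differ).
dat <- with(datasets::anscombe,
            data.frame(xabc = x1, ya = y1, yb = y2, yc = y3, yd = y4, xd = x4))

# One row per data set: (row name, column name) of the entry we want.
idx <- cbind(c("xabc", "xabc", "xabc", "xd"),
             c("ya",   "yb",   "yc",   "yd"))

# Indexing a matrix by a character matrix looks entries up via dimnames.
matched_cov <- cov(dat)[idx]
matched_cor <- cor(dat)[idx]

round(matched_cov, 3)
round(matched_cor, 4)
```

Because the entries within a matching pair do not depend on how rows are ordered across data sets, these four values should not be affected by the stand-in data's row order.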

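We note that the covariance of each \(x\) with its matching \(y\) is about 5.50 for all four sets \(a\), \(b\), \(c\), and \(d\), and that the corresponding correlations are all about 0.816. To within rounding error, the four sets share the same covariance and correlation. We can extract the four matching correlations directly.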

  with(anscombe, {
    c(
      cor(xabc, ya),
      cor(xabc, yb),
      cor(xabc, yc),
      cor(xd, yd)
    )
  })

## [1] 0.8164205 0.8162365 0.8162867 0.8165214
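The two tables are linked by \(r_{xy} = \operatorname{cov}(x, y) / \sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}\), where the variances are the diagonal entries of the covariance matrix. As a quick check, the sketch below recomputes the four correlations from numbers copied out of the `cov(anscombe)` output above; the vector names `cov_xy`, `var_x`, and `var_y` are illustrative.

```r
# Values copied from the printed cov(anscombe) table (so they carry its rounding).
cov_xy <- c(a = 5.501, b = 5.500, c = 5.497, d = 5.499)             # cov(x, y)
var_x  <- c(a = 11, b = 11, c = 11, d = 11)                         # var(xabc), var(xd)
var_y  <- c(a = 4.1272691, b = 4.12762909, c = 4.12262, d = 4.12324909)

# Correlation is covariance scaled by the two standard deviations.
r <- cov_xy / sqrt(var_x * var_y)
round(r, 4)
```

Since all four sets have the same variance in \(x\) and nearly the same variance in \(y\), near-equal covariances imply near-equal correlations.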